NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning

https://doi.org/10.1609/aaai.v36i5.20478

Ma, Yecheng Jason; Shen, Andrew; Bastani, Osbert; Dinesh, Jayaraman (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Full Text Available
Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching

Yecheng Jason Ma; Andrew Shen; Dinesh Jayaraman; Osbert Bastani (January 2023, Proceedings of the 39 th International Conference on Machine Learning)

We propose State Matching Offline Distribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning (IL) algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with dynamics or morphologically mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments as well as on high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms prior state-of-art.
more » « less
Full Text Available
Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

Yecheng Jason Ma; Jason Yan; Dinesh Jayaraman; Osbert Bastani (January 2022, 36th Conference on Neural Information Processing Systems (NeurIPS 2022))

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets. We propose Go al-conditioned f - A dvantage R egression (GoFAR), a novel regression-based offline GCRL algorithm derived from a state-occupancy matching perspective; the key intuition is that the goal-reaching task can be formulated as a state-occupancy matching problem between a dynamics-abiding imitator agent and an expert agent that directly teleports to the goal. In contrast to prior approaches, GoFAR does not require any hindsight relabeling and enjoys uninterleaved optimization for its value and policy networks. These distinct features confer GoFAR with much better offline performance and stability as well as statistical performance guarantee that is unattainable for prior methods. Furthermore, we demonstrate that GoFAR's training objectives can be re-purposed to learn an agent-independent goal-conditioned planner from purely offline source-domain data, which enables zero-shot transfer to new target domains. Through extensive experiments, we validate GoFAR's effectiveness in various problem settings and tasks, significantly outperforming prior state-of-art. Notably, on a real robotic dexterous manipulation task, while no other method makes meaningful progress, GoFAR acquires complex manipulation behavior that successfully accomplishes diverse goals.
more » « less
Full Text Available
Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

Yecheng Jason Ma; Jason Yan; Dinesh Jayaraman; Osbert Bastani (January 2022, 36th Conference on Neural Information Processing Systems (NeurIPS 2022))

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets. We propose $$\textbf{Go}$$al-conditioned $$f$$-$$\textbf{A}$$dvantage $$\textbf{R}$$egression (GoFAR), a novel regression-based offline GCRL algorithm derived from a state-occupancy matching perspective; the key intuition is that the goal-reaching task can be formulated as a state-occupancy matching problem between a dynamics-abiding imitator agent and an expert agent that directly teleports to the goal. In contrast to prior approaches, GoFAR does not require any hindsight relabeling and enjoys uninterleaved optimization for its value and policy networks. These distinct features confer GoFAR with much better offline performance and stability as well as statistical performance guarantee that is unattainable for prior methods. Furthermore, we demonstrate that GoFAR's training objectives can be re-purposed to learn an agent-independent goal-conditioned planner from purely offline source-domain data, which enables zero-shot transfer to new target domains. Through extensive experiments, we validate GoFAR's effectiveness in various problem settings and tasks, significantly outperforming prior state-of-art. Notably, on a real robotic dexterous manipulation task, while no other method makes meaningful progress, GoFAR acquires complex manipulation behavior that successfully accomplishes diverse goals.
more » « less
Full Text Available

Search for: All records